A national multiple baseline cohort study of mental health conditions in early adolescence and subsequent educational outcomes in New Zealand

Young people experiencing mental health conditions are vulnerable to poorer educational outcomes for many reasons, including social exclusion, stigma, and limited in-school support. Using a near-complete New Zealand population administrative database, this prospective cohort study aimed to quantify differences in educational attainment (at ages 15–16 years) and school suspensions (over ages 13–16 years) between those with and without a prior mental health condition. The data included five student cohorts, each starting secondary school in one of the years 2013 to 2017 (N = 272,901). Both internalising and externalising mental health conditions were examined. Overall, 6.8% had a mental health condition. Using adjusted modified Poisson regression analyses, those with prior mental health conditions exhibited lower rates of attainment (IRR 0.87, 95% CI 0.86–0.88) and higher rates of school suspensions (IRR 1.63, 95% CI 1.57–1.70) by age 15–16 years. Associations were stronger among those exhibiting behavioural conditions, compared to emotional conditions, in line with previous literature. These findings highlight the importance of support for young people experiencing mental health conditions at this crucial juncture in their educational pathway. While mental health conditions increase the likelihood of poorer educational outcomes, deleterious outcomes were not a necessary sequela. In this study, most participants with mental health conditions had successful educational outcomes.

Data. The data used in this study come from the New Zealand Integrated Data Infrastructure (NZIDI), a repository of administrative and survey data which links records at the individual level from national data sources across a range of government and non-government agencies 36 . The NZIDI captures records of individuals' interactions with government agencies for health, social services, education, justice, geography, housing and economics (tax and income). Data from the 2013 Census are also included. The data sources in the NZIDI, such as routinely collected records from government agencies, are linked via established record-linkage methodologies. The primary goal of record linkage is to link an entity (e.g., a person) from one file to the same entity in other file(s) 44,45 . The aim of the NZIDI methodology is to achieve a high linkage rate between data sources, while maintaining a false positive rate below 2% 46 . This is possible due to the granular nature of the variables which are used in the linkage. The core variables used for linking datasets are first name, last name, sex, year of birth, month of birth, and day of birth, which are supplemented by additional identifying variables as available in a given dataset. Access to and use of the data are governed by the Statistics New Zealand (SNZ) 'Five Safes' approach to statistical disclosure control 47 .
Study population. The base study population comprises an Ever Resident Population (ERP), which defines the resident population each year in the NZIDI 48 . For each year, individuals are included in the ERP if they have engaged with key services (e.g., tax, healthcare, education) in New Zealand during the two years preceding. Individuals are excluded if they have died or emigrated to another country before the end of that year. Moreover, individuals were excluded from the study population if they resided in another country for more than six months during the two year window for identifying mental health conditions (see the mental health measure sub-section) or more than one year outside of New Zealand during the education outcome window (see education measures sub-section) because in these cases we were unable to reliably measure these variables. Here, five successive cohorts of secondary school students were defined and extracted: those who entered their first year of secondary school (school Year 9, normally aged 13-14 years) in years 2013, 2014, 2015, 2016 or 2017. These years were selected to ensure consistent constituent data source coverage for the mental health condition identification 37 .
Primary mental health measures. Mental health conditions were created using an established, published, case identification methodology 37 . The full details of the methodology are available in the relevant publication 37 , and we provide an outline of the salient features for the present paper as follows. In brief, this approach brings together records from multiple data sources within the NZIDI, in conjunction with clinical judgement, to identify and classify clinically relevant mental health conditions among children and young people.
Data sources include the following Ministry of Health national collections: the Programme for the Integration of Mental Health Data (PRIMHD), a national collection of publicly funded specialist mental health service use (service use and diagnoses); the National Minimum Dataset (NMDS), a national collection of publicly funded hospital admissions (service use and diagnoses); Socrates, a national database of the Ministry of Health's Disability Support Service clients and service providers; and the Pharmaceutical collection, which holds claims and payment information from pharmacists for government-subsidised medication dispensing. The authors of this case identification method employed a two-stage approach to classify 13 mental health conditions. First, a shortlist of thirteen mental health conditions was derived by a team of clinical experts (anxiety, depression, emotional problems (indeterminate anxiety or depression), bipolar disorders, substance problems, eating problems, disruptive behaviours, psychosis, personality disorders, sleep problems, self-harm, other mental health problems, and mental health not defined). In the second stage, a panel of eight specialists, including a clinical psychologist, four child and adolescent psychiatrists, and three researchers in child and adolescent mental health, independently assigned a combination of diagnostic codes (ICD-10-AM and DSM-IV codes from PRIMHD, ICD-10-AM codes from NMDS, and assigned diagnosis codes upon referral from Socrates), service-use activity (PRIMHD) and inferences from medication prescribing (Pharmaceutical collection) to classify mental health conditions (see supplementary material for details). Disagreements over code assignment were resolved through discussion and consensus. The case identification method was not formally validated. The code lists for each grouping are reported in Supplementary Materials Table A1.
For the present study, we created three exposure variables, identifying mental health conditions from records during the 2 years prior to starting secondary school (approximately aged 11-13 years), to reduce the possibility of reverse causality. The first variable, 'emotional conditions', consisted of anxiety, depression, and indeterminate anxiety or depression. The second, termed 'behavioural conditions', comprised attention-deficit/hyperactivity disorder (ADHD), conduct problems, and oppositional defiant disorder (denoted as disruptive behaviours in the original case identification method). The final variable was a composite indicator of any mental health condition and included emotional and behavioural conditions, in addition to the smaller subgroups of psychosis, substance problems, sleep problems, eating disorders, self-harm, other mental health, and mental health not defined (bipolar disorders and personality disorders were excluded as these pertain to young people older than those included in the present study). The three exposure variables are not mutually exclusive, so individuals may be identified in multiple mental health exposure groups.
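The construction of the three overlapping exposure indicators can be illustrated with a minimal sketch. The condition labels, set names and the function `exposure_flags` below are hypothetical stand-ins chosen for illustration; the actual code lists are those reported in Supplementary Materials Table A1 and are not reproduced here.

```python
# Hypothetical condition labels (assumption); the real groupings are defined
# by the code lists in the published case identification method.
EMOTIONAL = {"anxiety", "depression", "emotional problems"}
BEHAVIOURAL = {"disruptive behaviours"}
OTHER_INCLUDED = {
    "psychosis", "substance problems", "sleep problems", "eating problems",
    "self-harm", "other mental health problems", "mental health not defined",
}
# Excluded from the composite because they pertain to older age groups.
EXCLUDED = {"bipolar disorders", "personality disorders"}


def exposure_flags(conditions):
    """Return the three non-mutually-exclusive exposure indicators."""
    found = set(conditions) - EXCLUDED
    return {
        "emotional": bool(found & EMOTIONAL),
        "behavioural": bool(found & BEHAVIOURAL),
        "any": bool(found & (EMOTIONAL | BEHAVIOURAL | OTHER_INCLUDED)),
    }


# A student with records for both anxiety and disruptive behaviours
# is counted in all three exposure groups.
both = exposure_flags({"anxiety", "disruptive behaviours"})
```

Because the indicators overlap, counts for the emotional and behavioural groups do not sum to the count for any mental health condition.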
Sociodemographic and potentially confounding variables. Sociodemographic variables measured at baseline included child's sex (male, female), age-at-school-entry (years), and ethnicity. These measures are sourced from multiple collections including Census data, birth records, health, and education data. The measures are generally self-reported. Here ethnicity comprised five groupings: Māori, Pacific peoples, Asian, Middle Eastern/Latin American/African (MELAA), and European/Other. This variable was classified using the total response approach, whereby categories are not mutually exclusive and individuals can belong to multiple ethnic groups. Māori represent the descendants of indigenous inhabitants of New Zealand, while the other groups refer to migrants, or descendants of original migrants, from the regions after which they are named. Additionally, highest level of parental education attainment, household income, level of deprivation, and residential location variables were used. Taken from the 2013 Census, highest level of parental education attainment (from either parent) was categorised into five groups: not reported, no qualification, secondary school qualification, post-secondary school qualification, and university level. Household income (NZD), also drawn from the 2013 Census, was banded into the following categories ≤ $25,000; $25,001-$50,000; $50,001-$100,000; $100,101-$150,000; ≥ $150,001 (median household income was $63,800 in 2013) 49 . Deprivation was measured using the New Zealand Deprivation Index (NZDep) 50 . This small-area (around 30 households) deprivation measure combines Census information on income, employment, qualifications, communication, support, living space, transport and home ownership into a single continuous measure. This was partitioned into quintiles, with one representing the least deprived households and five representing those with the greatest level of deprivation. 
Finally, residential location used the Statistics New Zealand (SNZ) 2016 urban/rural classification, which has five categories: (1) main urban (population of at least 30,000), (2) secondary urban (population 10,000-29,999), (3) minor urban (population 1000-9999), (4) rural centre (population 300-999) and (5) other rural (population < 300). These were collapsed into two groups to create a binary indicator: urban (main urban, secondary urban and minor urban area) and rural (rural centre and other rural).
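The collapse of the five-category classification into a binary indicator amounts to a simple lookup. The sketch below is illustrative only; the category strings and the function name `urban_rural` are assumptions, not identifiers from the NZIDI.

```python
# Category labels are illustrative assumptions, not NZIDI field values.
URBAN = {"main urban", "secondary urban", "minor urban"}
RURAL = {"rural centre", "other rural"}


def urban_rural(category):
    """Collapse the five-level classification to 'urban' or 'rural'.

    Returns None when the category is missing or unrecognised, mirroring
    the small share of records without an urban/rural indicator.
    """
    if category in URBAN:
        return "urban"
    if category in RURAL:
        return "rural"
    return None
```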

Statistical analyses.
The reporting in this study follows the Reporting of studies Conducted using Observational Routinely-collected Data (RECORD) guidelines 51 . After describing the participant flow and their characteristics, complete-case multilevel mixed-effect modified Poisson regressions were used to relate mental health indication to educational outcomes, namely NCEA Level 1 Certificate achievement and stand-down or suspension indication 52 . Modified Poisson regressions, with robust variance estimators, were chosen to provide direct estimates of incidence rate ratios (IRRs) because the educational outcomes were not rare 53 , and conventional logistic regression models produce odds ratios which overstate IRRs when outcomes are common. School-level random effects were included to account for intra-school correlation. Three model specifications were investigated, each sequentially adding a new set of potential confounders. Model 0 reported unadjusted IRRs, which give the risk of experiencing an educational outcome for students with a prior mental health condition relative to students with no prior mental health condition, adjusting only for school cohort year. Model 1 reported adjusted IRRs by adding participants' demographic variables (namely: sex, age-at-school-entry and ethnicity); Model 2 further added potential socioeconomic confounders (namely: highest parental education level, household income bands, quintiles of area-deprivation and residential location). Because previous studies have found differential associations between mental health conditions and education-related outcomes by sex, we estimated all models separately by sex 54 , with results reported in Supplementary Materials. Data were extracted using SAS 7.1 (SAS Institute Inc., 2014) and analysed using Stata MP version 16.0 (StataCorp, 2019), and two-tailed α = 0.05 defined significance. To assess the model's predictive power, the c-statistic was employed.
This gives the probability a randomly selected participant who experienced an exposure (e.g., mental health indication) had a higher risk score than a participant who had not experienced the exposure. Model fit is typically considered reasonable when the c-statistic is higher than 0.7 and strong when the c-statistic exceeds 0.8 55 .
Approvals and ethics. This study is a secondary analysis of routinely-collected de-identified administrative information housed and curated by Statistics New Zealand. The NZIDI is designed to be a research database that holds de-identified administrative records about people and households that come from government agencies, Statistics New Zealand surveys, and non-government organisations. Access to the data was provided by Statistics New Zealand under conditions designed to give effect to the security and confidentiality provisions of the Statistics Act 1975 38 . The data that support the findings of this study are available from Statistics New Zealand, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. The data and code used for the purpose of the study are, however, available upon reasonable request from Statistics New Zealand. The data can only be accessed by approved bona fide researchers, for projects that are in the public interest and within a secure accredited data lab (see: https://www.stats.govt.nz/integrated-data/apply-to-use-microdata-for-research). Statistics New Zealand uses the 'Five Safes' and 'Ngā Tikanga Paihere' frameworks to manage safe access to the information. Further description is given at: https://www.stats.govt.nz/integrated-data/integrated-data-infrastructure/your-information-in-the-idi/. The study proposal and protocols were approved by Statistics New Zealand (MAA2017-16).
Based on New Zealand's Health and Disability Ethics Committees' (HDEC) checklist, the study did not meet the threshold required for formal ethics review. The University of Otago Human Research Ethics Committee reviewed the study for ethics consideration. The study was reviewed as a 'Minimal Risk Health Research-Audit and Audit related studies' proposal and was approved (reference: HD17/004). Informed consent from participants was not deemed necessary according to national legislation, i.e., the Statistics Act 1975 38 . All methods were carried out in accordance with relevant Statistics New Zealand and HDEC guidelines and regulations, and the reported results include only aggregated, de-identified data that have been randomly rounded to base 3. Random rounding to base 3 (RR3) involves randomly changing each count in a table to a multiple of 3. This is required by Statistics New Zealand in order to meet data confidentiality requirements.
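To make the RR3 step concrete, the following is a minimal sketch of one common formulation of random rounding to base 3. The text above specifies only that each count is randomly changed to a multiple of 3; the specific rule used here (counts already divisible by 3 are left unchanged, other counts move to the nearer multiple with probability 2/3 and to the farther one with probability 1/3) is an assumption for illustration, and the function name `rr3` is hypothetical.

```python
import random


def rr3(count, rng=random):
    """Randomly round a non-negative integer count to a multiple of 3.

    Assumed rule: multiples of 3 are unchanged; otherwise the nearer
    multiple is chosen with probability 2/3, the farther with 1/3.
    """
    remainder = count % 3
    if remainder == 0:
        return count
    lower = count - remainder
    upper = lower + 3
    # remainder 1 is closer to the lower multiple, remainder 2 to the upper.
    nearest, farthest = (lower, upper) if remainder == 1 else (upper, lower)
    return nearest if rng.random() < 2 / 3 else farthest


# Example: round a small table of counts with a seeded generator.
rounded = [rr3(c, random.Random(42)) for c in (7, 8, 9, 10)]
```

Under this rule each published count differs from the true count by at most 2, and the rounding is unbiased in expectation, which is why percentages computed from rounded counts remain close to the underlying values.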

Results
Participants. Over the 2013-2017 study baseline periods, 286,176 pupils who met the ERP criteria were enrolled in school Year 9. However, 9549 pupils were excluded as they were in another country during the 2-year mental health condition identification period, as were 3471 pupils who were in another country for at least 12 months during the follow-up period (school Years 9-11), and 255 who were aged under 12 years or over 15 years at the beginning of secondary school Year 9. This left a final analytical sample, who had data on prior mental health conditions and both educational outcomes, of n = 272,901 (95.4%).
Identification of mental health conditions. Overall, 2286 (0.8%) participants were identified as having a behavioural mental health condition, 10,635 (3.9%) with an emotional mental health condition, and 18,603 (6.8%) participants with a mental health condition of any type. Table 1 also presents the distribution of sociodemographic variables for those with any, behavioural, or emotional conditions. It is evident from Table 1 that several patterns emerge, supporting the confounding relationships between mental health conditions and sociodemographic variables. For example, males are over-represented among those with a behavioural mental health condition (54.4% were male), whereas the sex split is less pronounced for emotional mental health conditions (51.2% were male). Social patterning is evident, particularly for area deprivation. Those living in more deprived areas are over-represented among those with behavioural conditions (27.7% resided in the most deprived quintile of areas), but less so for any mental health or emotional conditions (21.0% resided in the most deprived quintile of areas).
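The participant flow and prevalence figures reported above can be checked with simple arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative, and because published counts are randomly rounded to base 3, the percentages are approximate.

```python
# Counts as reported in the text (randomly rounded to base 3 at source).
enrolled = 286_176
exclusions = {
    "overseas during mental health identification window": 9_549,
    "overseas at least 12 months during follow-up": 3_471,
    "aged under 12 or over 15 at Year 9 entry": 255,
}

analytic_n = enrolled - sum(exclusions.values())
retained_pct = round(100 * analytic_n / enrolled, 1)

condition_counts = {"behavioural": 2_286, "emotional": 10_635, "any": 18_603}
prevalence_pct = {
    name: round(100 * n / analytic_n, 1) for name, n in condition_counts.items()
}

print(analytic_n, retained_pct, prevalence_pct)
```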

Educational outcomes.
Overall, by the end of Year 11, 205,638 (75.4%) participants gained their NCEA Level 1 Certificate. However, over their Year 9-11 school tenure, 26,748 (9.8%) participants also had at least one period of stand-down or suspension. The sociodemographic distribution of the sample, conditional on experiencing these educational outcomes, is presented in Table 2. Again, differential patterns emerge. While the proportion gaining the NCEA Level 1 Certificate was similar by sex (51.7% were female), females were under-represented among those being stood-down or suspended (34.6% were female). In contrast, Māori and Pacific peoples, and those living in more deprived areas, were under-represented among those with NCEA Level 1 Certificate attainment (among those gaining NCEA Level 1, 17.4% lived in the most deprived quintile of areas) and over-represented among those experiencing stand-down and suspension (among those who were stood-down or suspended, 40.9% lived in the most deprived quintile of areas). Table A2). Similarly, among those with any mental health condition 16.2% were stood-down or suspended, compared to 9.3% among those with no mental health condition; hence the overwhelming majority of those with a mental health indication were neither stood-down nor suspended. This difference was also observed among boys and girls, with a larger difference for boys (21.0% were suspended among boys with any mental health condition, compared to 11.9% of boys with no mental health condition, the figures for girls being 10.6% and 6.7% respectively; see Supplementary Materials, Table A2).
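The stand-down and suspension proportions above also illustrate why risk ratios, rather than odds ratios, suit outcomes that are not rare. The sketch below computes the crude (unadjusted) risk ratio and odds ratio from the two reported percentages; it is a worked arithmetic example only, not a reproduction of the adjusted multilevel models, and the variable names are illustrative.

```python
# Reported proportions stood-down or suspended (crude, unadjusted).
risk_exposed = 0.162    # any prior mental health condition
risk_unexposed = 0.093  # no prior mental health condition

# Crude risk ratio: ratio of the two proportions.
risk_ratio = risk_exposed / risk_unexposed

# Crude odds ratio: ratio of the odds p / (1 - p) in each group.
odds_exposed = risk_exposed / (1 - risk_exposed)
odds_unexposed = risk_unexposed / (1 - risk_unexposed)
odds_ratio = odds_exposed / odds_unexposed

print(f"crude risk ratio = {risk_ratio:.2f}")
print(f"crude odds ratio = {odds_ratio:.2f}")
```

The odds ratio is further from 1 than the risk ratio here, and the gap widens as the outcome becomes more common, as with NCEA Level 1 attainment. The crude risk ratio also differs from the adjusted IRR of 1.63 reported in the abstract because it takes no account of confounders or school-level clustering.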
When examining behavioural and emotional mental health conditions separately, stand-down and suspension rates were higher (32.4%), and NCEA Level 1 Certificate attainment lower (41.9%), among those experiencing behavioural conditions compared to the full population (9.8% and 75.4%, respectively), whereas those experiencing emotional health conditions had more similar outcomes to the full study population (10.8% and 72.8%, respectively); see Table 3. These patterns were consistent among both boys and girls (Supplementary Materials, Table A2). However, the extent to which these patterns may be explained by confounding from sociodemographic characteristics observed in Tables 1 and 2 Table A2). The
Table 1. Sociodemographic distribution of the full sample (n = 272,901), and among those with any (n = 18,603, 6.8%), behavioural (n = 2,286, 0.8%), or emotional (n = 10,635, 3.9%) mental health condition. Data source is the NZ-IDI. Counts and column percentages for ethnicity do not add to 100%, because the total response ethnicity coding is used where categories are not mutually exclusive, so an individual may belong to more than one ethnic group; 1.5% of the sample were missing for deprivation; 1.4% were missing the urban/rural indicator; 0.01% were missing data for ethnicity.

Discussion
Using a near-complete population of New Zealand young people entering secondary school during the period 2013-2017, the presence of a mental health condition (prevalence 6.8% for any type) was significantly associated with lower rates of attainment and higher rates of stand-downs and/or suspensions by school Year 11. These relationships endured after adjusting for a suite of sociodemographic factors. These findings are in line with a literature which studies links between mental health and educational outcomes [24][25][26]56 , across a range of data types and study designs 27,28 . This study provides timely evidence for the New Zealand context, where inequalities in mental health and education have been documented 6,21 , but there is less evidence studying the connection between these two important variables. The observed associations were especially pronounced among those young people exhibiting behavioural conditions, compared to emotional conditions. Previous studies have also documented negative outcomes associated with externalising conditions, such as ADHD, and highlighted a need for greater awareness of these conditions and additional support for schools in supporting these students 57 . The smaller association for emotional conditions may be due to heterogeneity within this category, as outcomes for those with mental health conditions are likely to vary depending on the specific type of condition, its symptoms and severity. Anxiety, for instance, can manifest via a diverse range of symptoms which may each associate differently with educational outcomes. Differential associations by type of mental health condition have been identified in previous literature; for instance, a recent study found that obsessive-compulsive disorder and anorexia nervosa were associated with higher grades in secondary school 29 . Further work examining outcomes associated with specific mental health conditions may shed light on heterogeneity in outcomes for young people with mental health conditions.
Table 4. Incidence rate ratios (IRR), with associated 95% confidence intervals (CIs), for the relationship between mental health conditions and gaining NCEA Level 1 Certificate at Year 11 and stand-down/suspensions, respectively. Data source is the NZ-IDI. Model 0 included indicators for school cohort; Model 1 included indicators for school cohort, child's sex, age-at-school-entry, ethnicity; Model 2 included indicators for school cohort, child's sex, age-at-school-entry, ethnicity, highest parental education level, household income bands, quintiles of area-deprivation and residential location. c-statistic is Harrell's c-statistic. The complete-case samples for Models 1 and 2 were lower than the full sample in Model 0, due to 1.5% of the sample being missing for deprivation; 1.4% missing the urban/rural indicator; 0.01% missing data for ethnicity.
While measured effect sizes were at times large, and outcomes generally poorer for those identified as having a mental health condition, it must be emphasised that the majority of children gained an NCEA Level 1 Certificate and did not experience stand-downs or suspensions, regardless of their mental health status. Indeed, of those with any mental health condition, 65.0% attained NCEA Level 1 Certificate and 83.8% never experienced a stand-down or suspension. Poor academic outcomes, as measured here using two sentinel indices, do not necessarily result from having a mental health condition. Māori and Pacific peoples were also under-represented among those gaining an NCEA Level 1 Certificate, and over-represented among those experiencing stand-down and suspension. Evidently there are inequities occurring for Māori and Pacific young people; however, it is outside the scope of this study to discuss the socio-political experience of these groups. The investigation of the underlying determinants and potential solutions is best addressed from within those communities.
Methodologically, this study highlights the utility of administrative data for studying links between health and education. While sample survey data have many advantages, including the richness of variables collected, they can suffer from high rates of non-response, smaller sample sizes, and recall and self-reporting bias, and can be relatively costly to collect while maintaining high response rates. For example, the English Avon Longitudinal Study of Parents and Children birth cohort study (ALSPAC) cost an estimated £1 million per year to follow 14,000 families over 20 years 58,59 . The use of linked, routinely-collected records may offer a lower-cost complement to sample surveys and birth cohort studies to inform on the aetiology and consequences of mental health in childhood and adolescence. They also enable the development of contemporary, virtual or synthetic investigator-defined cohorts and data which purposefully address the research question at hand 60 .
Strengths and limitations. The use of linked administrative data is eminently suitable for contributing to the health and education nexus 61 , and the NZIDI in particular has several attractive properties for this study. First, the large sample sizes facilitate analyses of infrequent events (such as stand-downs and suspensions) and subgroup analyses. Second, the use of administrative records removes concern of differential self-report bias which is common in sample surveys. Additionally, compared to other studies in this area, which typically use either service use alone or an ICD code diagnosis 29 , we used a novel classification method for identifying probable mental health conditions among young people which combines information from use of multiple services, diagnostic codes and clinical judgement 37 . However, the study is not without limitations.
While the employed mental health case classification approach overcomes several important concerns in estimating the prevalence of mental health conditions, the method does rely on secondary service use data, among which severe mental health conditions may be over-represented. Similarly, there may be undercounting of cases for those who do not seek medical care. The extent of any undercounting or false positive rate is currently unknown. Investigating this, potentially via means of comparison with analogous survey data, represents a fruitful direction for further research. Additionally, because we use official records of suspensions and stand-downs, any informal encouragement to keep a child away from school due to behavioural problems would not be captured in our data, potentially underestimating the true extent of suspensions 62 .
Further, while this is a large prospective study, its observational nature, together with the likely unbalanced confounder pathways and patterns (both known and unknown), means that it is difficult to assert causality 63 . For example, early-life circumstance may impact on the probability of developing a mental health condition as well as shaping educational outcomes and attainment, thus generating a spurious correlation between these two variables. However, a comprehensive set of sociodemographic variables was used to gauge the confounding effects and attempt to mitigate this impact. Moreover, a link between mental health conditions and educational outcomes has also been observed across studies using alternative study designs which rely on different assumptions, such as Mendelian randomisation 27,28 , and family fixed effect models 9 , which provides supporting evidence for the veracity of this association. Still, even if these differences are not causal effects (that is, due only to mental health) or, indeed more likely, reflect some mix of correlation and causation, it remains important to document health-related inequalities in educational outcomes. Gaps in educational attainment highlight that there are young people with support needs which are not being met by the current system, and who are experiencing poorer outcomes at an important juncture in their school career. This was highlighted in a recent study of young people in New Zealand, which showed autistic students experienced significantly higher odds of suspension compared with their non-autistic peers, but this was mitigated by the receipt of high-need education-based funding 64 . Although those findings were for Autism Spectrum Disorder, rather than the mental health conditions considered in this paper, they highlight the potential of in-school support and funding as a means of reducing health-related educational inequalities.
Finally, in this study, differences between two groups were identified at one particular, and important, moment of early adolescence. However, the links between education and mental health are complex, dynamic and bi-directional, and this study has not elucidated the precise mechanisms through which mental health may impact on later outcomes, or indeed the extent to which suspensions may explain mental health's association with attainment. This represents an avenue for further work. The external validity of the findings will also be impacted by the COVID-19 crisis. In the post COVID-19 era, both mental health and educational outcome landscapes will likely have been altered in the medium term, if not irrevocably. Only future studies conducted in this era will illuminate the extent of this impact, and inform on how policy and practice may need to change to accommodate the consequences of COVID-19.

Conclusions
In a near-complete population of New Zealand young people entering secondary school during the period 2013-2017, having a prior mental health condition was significantly associated with lower rates of attainment and higher rates of stand-downs and/or suspensions by age 15-16 years. While mental health conditions increase the risk of poorer educational outcomes, deleterious outcomes are not a necessary sequela; here, most participants with mental health conditions had successful educational outcomes. Future investigations which explore and understand the different positive trajectories of adolescents with mental health conditions may provide pathways to reduce the educational attainment differences observed here and elsewhere.

Data availability
The datasets used for statistical analysis are held by Statistics New Zealand. These results are not official statistics. They have been created for research purposes from the Integrated Data Infrastructure (IDI) which is carefully managed by Stats NZ. For more information about the IDI please visit https://www.stats.govt.nz/integrated-data/. The data that support the findings of this study are available from Statistics New Zealand, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. The data and code used in this study are however available from the authors upon reasonable request and with permission of Statistics New Zealand (see: https://www.stats.govt.nz/integrated-data/apply-to-use-microdata-for-research). The data can only be accessed by approved bona fide researchers, for projects that are in the public interest and within a secure accredited data lab.